Developments in the TIGER Annotation Scheme and their Realization in the Corpus
نویسندگان
چکیده
This paper presents the annotation of the German TIGER Treebank. First, issues concerning the annotation, representation as well as querying of the treebank are discussed. Within this context, the annotation tool ANNOTATE, the export and XML formats of the TIGER Treebank and the TIGER search tool are briefly introduced. Secondly, the developments of the TIGER annotation scheme and their realization in the corpus are introduced focussing on the differences between the underlying NEGRA annotation scheme and the further developed TIGER annotation scheme. The main differences are concerned with verb-subcategorization, coordination, appositions and parentheses as well as proper nouns. Thirdly, the annotation scheme is assessed through an evaluation and a problem discussion of the above mentioned changes. For this purpose, inter-annotator agreement in the TIGER project has been analyzed focussing on exactly these changes. This analysis shows where the annotators’ decision problems are. These difficulties are discussed in greater detail on the basis of annotation examples. The paper concludes with some suggestions for the improvement of the TIGER annotation scheme.
منابع مشابه
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کاملA CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images
Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...
متن کاملThe SALSA Corpus: a German Corpus Resource for Lexical Semantics
This paper describes the SALSA corpus, a large German corpus manually annotated with role-semantic information, based on the syntactically annotated TIGER newspaper corpus (Brants et al., 2002). The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the frame-semantic annotation framework and it...
متن کاملSemantic Annotation for Generation: Issues in annotating a corpus to develop and evaluate discourse entity realization algorithms
We are annotating a corpus with information relevant to discourse entity realization, and especially the information needed to decide which type of NP to use. The corpus is being used to study correlations between NP type and certain semantic or discourse features, to evaluate hand-coded algorithms, and to train statistical models. We report on the development of our annotation scheme, the prob...
متن کاملA Declarative Formalism for Constituent-to-Dependency Conversion
In this paper, we present a declarative formalism for writing rule sets to convert constituent trees into dependency graphs. The formalism is designed to be independent of the annotation scheme and provides a highly task-related syntax, abstracting away from the underlying graph data structures. We have implemented the formalism in our search tool and used a preliminary version to create a rule...
متن کامل